Implement first version of JMH microbenchmarks #18
Conversation
```java
public static final List<BenchmarkObject> BENCHMARK_OBJECTS =
    ImmutableList.of(
        BenchmarkObject.builder()
            .keyName("random-1mb.txt")
```
NIT: can we have these key names defined as constants? They seem to be accessed from multiple files
Yep, I don't know how I didn't notice this :( I will address this in a follow-up PR.
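For illustration, a minimal sketch of the suggested refactor might look like the following. The class and constant names here are hypothetical, not the PR's actual code:

```java
// Hypothetical holder for the object keys shared across benchmark files,
// so each key name has a single source of truth.
public final class BenchmarkKeys {
    public static final String RANDOM_1MB = "random-1mb.txt";

    private BenchmarkKeys() {} // utility class, no instances
}
```

Call sites would then reference `BenchmarkKeys.RANDOM_1MB` (e.g. `.keyName(BenchmarkKeys.RANDOM_1MB)`) instead of repeating the string literal in multiple files.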
Just run `./gradlew jmh --rerun`. (The `--rerun` flag works around a Gradle quirk: you may want to re-run benchmarks even when the source of your project has not changed, and `--rerun` disables the Gradle optimisation that skips build steps when nothing has changed.)
Can we have a script that takes the bucket name/prefix as a command-line argument, creates the bucket and generates the data if they do not exist, and then runs the benchmark as well?
That's a great suggestion, and it will be very useful for new contributors once this gets open-sourced. For now, I will try not to block on it, so I created a backlog item for this.
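As a rough illustration of what that backlog item could look like, a Java entry point along these lines could check for the bucket and generate data before handing off to the benchmarks. The `DataGenerator` hook is a hypothetical placeholder for the PR's generator utility; the S3 calls are the standard AWS SDK v2 API:

```java
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.CreateBucketRequest;
import software.amazon.awssdk.services.s3.model.HeadBucketRequest;
import software.amazon.awssdk.services.s3.model.NoSuchBucketException;

public final class BenchmarkSetup {
    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("usage: BenchmarkSetup <bucket> <prefix>");
            System.exit(1);
        }
        String bucket = args[0];
        String prefix = args[1];

        try (S3Client s3 = S3Client.create()) {
            try {
                // Cheap existence check; throws if the bucket is missing.
                s3.headBucket(HeadBucketRequest.builder().bucket(bucket).build());
            } catch (NoSuchBucketException e) {
                // Bucket is missing: create it, then generate the test data.
                s3.createBucket(CreateBucketRequest.builder().bucket(bucket).build());
                // DataGenerator is a stand-in for the PR's generator utility:
                // DataGenerator.generate(s3, bucket, prefix);
            }
        }
        // Finally, hand off to JMH, e.g. by invoking `./gradlew jmh --rerun`
        // with BUCKET and PREFIX exported as described in the README.
    }
}
```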
Description of changes:
Now that we have a functional (correct but slow) implementation of S3SeekableStream, it makes sense to start monitoring its performance.
This PR implements basic microbenchmarks that test full sequential reads, forward seeks, backward seeks, and a Parquet-like ("jumping around") access pattern. For now, we only compare against the performance of a single standard (i.e., non-CRT) S3 async client. Using these benchmarks, we can start implementing optimisations and get relatively quick feedback on what (if anything) they improved.
To run the microbenchmarks, one has to assume AWS credentials and specify two environment variables (a BUCKET and a PREFIX). We include a generator utility so that setup is easy and this can later be open-sourced. The README is updated with instructions for running these benchmarks.
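To make the shape of these benchmarks concrete, here is a minimal JMH-style sketch of the sequential-read case. The `S3SeekableStream` constructor, its `read()` signature, and its closeability are assumptions made for illustration; the PR's actual benchmark classes define the real harness:

```java
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class SequentialReadBenchmark {

    // BUCKET and PREFIX come from the environment, as described above.
    private final String bucket = System.getenv("BUCKET");
    private final String prefix = System.getenv("PREFIX");

    @Param({"random-1mb.txt"}) // illustrative object key
    public String keyName;

    @Benchmark
    public long sequentialRead() throws Exception {
        // Assumed API: an InputStream-like, AutoCloseable seekable stream.
        try (S3SeekableStream stream = new S3SeekableStream(bucket, prefix + keyName)) {
            long total = 0;
            while (stream.read() != -1) {
                total++;
            }
            return total; // returned so JMH cannot dead-code-eliminate the read loop
        }
    }
}
```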
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
An example output of a run looks like this: